Overview
Visualizing data in high-dimensional space (Euclidean space, \(\mathbb{R}^p,~~p>3\), where p is the number of numeric variables) can be messy and unintuitive. Analysis in higher dimensions should be interpretable in terms of the original dimensions and should minimize the loss of the information held in the data.
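One common way to quantify how much information a 2-d view keeps is the share of total variance retained by the projection. Below is a minimal base-R sketch, assuming numeric data and an orthonormal p x d basis. The helper `var_retained()` and the simulated data are hypothetical and serve only as an illustration.

```r
# Hypothetical helper: share of total variance kept when the (column-centered)
# data X are projected onto an orthonormal p x d basis B.
var_retained <- function(X, B) {
  Xc <- scale(X, center = TRUE, scale = FALSE)  # center each column
  sum(apply(Xc %*% B, 2, stats::var)) / sum(apply(Xc, 2, stats::var))
}

set.seed(1)
# Simulated p = 5 data with one strongly correlated pair (illustration only)
X <- matrix(stats::rnorm(200 * 5), ncol = 5)
X[, 2] <- X[, 1] + stats::rnorm(200, sd = 0.1)

# Basis spanned by the first two principal components
B_pca <- stats::prcomp(X)$rotation[, 1:2]
# A random orthonormal 2-d basis, for comparison
B_rand <- qr.Q(qr(matrix(stats::rnorm(5 * 2), ncol = 2)))

var_retained(X, B_pca)   # at least as large as the random basis's share
var_retained(X, B_rand)
```

Among all orthonormal 2-d bases, the first two principal components maximize this ratio. No other single 2-d linear projection of the same data can retain more variance.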
To these ends, we advise the use of projection pursuit as implemented in the R package tourr (H. Wickham et al., 2011). Further, we implement a method of manual controls following D. Cook & A. Buja (1997) in the R package spinifex, currently available via devtools::install_github("nspyrison/spinifex"). We also compare and contrast alternative methodology, namely Principal Component Analysis (PCA; K. Pearson, 1901), t-distributed Stochastic Neighbor Embedding (t-SNE; L. van der Maaten & G. Hinton, 2008), and the holes-optimized tour (an application of projection pursuit; J. Friedman & J. Tukey, 1974). The grand tour was proposed by D. Asimov (1985).
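The central move behind manual controls is to rotate one variable's contribution into or out of the current 2-d projection. Below is a minimal base-R sketch of a *radial* manipulation, in the spirit of Cook & Buja (1997) but using our own simplified parameterization. `radial_manip()` is a hypothetical helper, not a spinifex function. It assumes an orthonormal p x 2 basis in which the chosen variable is partly, but not fully, present.

```r
# Hypothetical helper (not the spinifex API): radially rescale the contribution
# of variable k in an orthonormal p x 2 basis B by rotating through angle phi
# within the 3-d manipulation space spanned by B and variable k's direction.
radial_manip <- function(B, k, phi) {
  p <- nrow(B)
  e_k <- replace(numeric(p), k, 1)        # unit vector for variable k
  b <- drop(crossprod(B, e_k))            # variable k's 2-d contribution (its axis)
  len <- sqrt(sum(b^2))
  if (len < 1e-8 || len > 1 - 1e-8) stop("variable k must be partly, not fully, in the plane")
  m <- e_k - drop(B %*% b)                # part of e_k orthogonal to the plane
  m <- m / sqrt(sum(m^2))                 # unit manipulation direction
  u <- b / len                            # in-plot direction of variable k's axis
  u_perp <- c(-u[2], u[1])                # orthogonal in-plot direction
  d <- drop(B %*% u)                      # in-plane p-vector along variable k's axis
  d_perp <- drop(B %*% u_perp)            # in-plane p-vector orthogonal to it
  d_new <- cos(phi) * d + sin(phi) * m    # rotate d toward m; d_perp is untouched
  outer(d_new, u) + outer(d_perp, u_perp) # new orthonormal p x 2 basis
}

set.seed(2)
B <- qr.Q(qr(matrix(stats::rnorm(6 * 2), ncol = 2)))  # random 6 x 2 orthonormal basis
a <- asin(sqrt(sum(B[3, ]^2)))                         # variable 3's contribution as an angle
B_out <- radial_manip(B, 3, -a)                        # remove variable 3 from the view
B_in  <- radial_manip(B, 3, pi / 2 - a)                # give variable 3 its full contribution
sqrt(sum(B_out[3, ]^2))  # ~0
sqrt(sum(B_in[3, ]^2))   # ~1
```

Variable k's axis keeps its direction in the plot and only its length changes, which makes the manipulation *radial*. Setting `phi = 0` returns the original basis.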
The R package tourr (H. Wickham et al., 2011) provides a means to animate 2-d projections of a rotated p-dimensional data object. The path of rotation may be a random walk, a predefined path, or a path that optimizes an index by (“semi-”)stochastic gradient descent (projection pursuit, mentioned above).
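Each frame of a tour is the data multiplied by an orthonormal p x d basis. Below is a minimal base-R sketch of one such frame, assuming a random basis drawn through a QR decomposition. `random_basis()` and the simulated data are hypothetical. tourr also interpolates smoothly between successive bases, and this sketch omits that step.

```r
# Hypothetical helper: a random orthonormal p x d basis, obtained from the
# QR decomposition of a matrix of standard normal draws.
random_basis <- function(p, d = 2) {
  qr.Q(qr(matrix(stats::rnorm(p * d), nrow = p, ncol = d)))
}

set.seed(3)
X <- matrix(stats::rnorm(100 * 6), ncol = 6)  # simulated n = 100, p = 6 data (illustration only)
B <- random_basis(ncol(X))                     # 6 x 2 projection basis
Y <- X %*% B                                   # 100 x 2 coordinates to plot for this frame
dim(Y)                                         # c(100, 2)
```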
(Work in progress. TODO: add to, clean up.)
Thanks
Prof. Dianne Cook - Guidance, inspiration, and contributions to projection pursuit
Dr. Ursula Laa - Collaboration, use cases, and development feedback
References
H. Wickham, D. Cook, H. Hofmann, and A. Buja (2011). tourr: An R package for exploring multivariate data with projections. Journal of Statistical Software, 40(2). http://www.jstatsoft.org/v40.
D. Asimov (1985). The grand tour: a tool for viewing multidimensional data. SIAM Journal on Scientific and Statistical Computing, 6(1), 128–143.
D. Cook and A. Buja (1997). Manual controls for high-dimensional data projections. Journal of Computational and Graphical Statistics, 6(4), 464–480. https://doi.org/10.2307/1390747
H. Wickham, D. Cook, and H. Hofmann (2015). Visualising statistical models: Removing the blindfold (with discussion). Statistical Analysis and Data Mining, 8(4), 203–225.
Other reading
74 observations of 6 variables: physical measurements taken across 3 species of flea beetles. The methods are unsupervised, but the points are colored by species.
(TODO: scale output of spinifex::proj_data(), case handling for spinifex::slideshow(), apply Phys data.)
(TODO: fix spinifex here.)
PCA: p ordered linear combinations of p dimensions.
t-SNE: p unordered non-linear combinations of p dimensions.

| Method | Interpretable | MaxVarRetention | GlobalOptimum | CannotOverfit | NonLinearData |
|---|---|---|---|---|---|
| PCA | TRUE | FALSE | TRUE | TRUE | FALSE |
| t-SNE | FALSE | NA | FALSE | FALSE | TRUE |
| Tour, holes | TRUE | TRUE | FALSE | TRUE | FALSE |
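The "Tour, holes" row depends on optimizing a projection-pursuit index. Below is a minimal base-R sketch, assuming standardized data. `holes_index()` is our hypothetical restatement of a holes-type index, which grows as fewer points fall near the centre of the view. `guided_search()` is a crude random search used as a stand-in. tourr's guided tour uses its own search and interpolation, so treat this only as an illustration of how such an optimizer climbs toward a local optimum.

```r
# Hypothetical holes-type index for an n x d projection Y of standardized data:
# it grows as fewer points fall near the centre (origin) of the view.
holes_index <- function(Y) {
  n <- nrow(Y)
  d <- ncol(Y)
  (1 - sum(exp(-0.5 * rowSums(Y^2))) / n) / (1 - exp(-d / 2))
}

# Crude random search (a stand-in, not tourr's optimizer): perturb the basis,
# re-orthonormalize with QR, and keep the candidate only if the index improves.
guided_search <- function(X, index, n_iter = 200, step = 0.3) {
  p <- ncol(X)
  B <- qr.Q(qr(matrix(stats::rnorm(p * 2), nrow = p)))
  start <- best <- index(X %*% B)
  for (i in seq_len(n_iter)) {
    B_try <- qr.Q(qr(B + step * matrix(stats::rnorm(p * 2), nrow = p)))
    val <- index(X %*% B_try)
    if (val > best) {
      B <- B_try
      best <- val
    }
  }
  list(basis = B, start = start, index = best)
}

set.seed(4)
# Simulated data: variable 1 is bimodal (a "hole" in the middle), the rest are noise
X <- scale(cbind(c(stats::rnorm(50, -2), stats::rnorm(50, 2)),
                 matrix(stats::rnorm(100 * 4), ncol = 4)))
res <- guided_search(X, holes_index)
round(abs(res$basis[, 1]), 2)  # loadings of each variable on the first projection axis
```

The search only accepts improvements from its current basis, so its result depends on the random start. It can settle on a local rather than a global optimum.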
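library(tourr)  # flea data, guided_tour(), holes(), animate_xy(), save_history()
library(Rtsne)  # t-SNE
f  <- scale(flea[, 1:6])  # the 6 numeric measurements, standardized
sp <- flea$species        # used only for colour; all methods are unsupervised
f.pca <- stats::prcomp(f)  # PCA
ggplot2::ggplot(data.frame(f.pca$x[, 1:2], sp), ggplot2::aes(PC1, PC2, colour = sp)) + ggplot2::geom_point()
f.tsne <- Rtsne(f, dims = 2, perplexity = 10, check_duplicates = FALSE)  # perplexity must be < (n - 1) / 3
f.tsne.pca <- stats::prcomp(f.tsne$Y)  # rotate the 2-d embedding onto its principal axes
ggplot2::ggplot(data.frame(f.tsne.pca$x, sp), ggplot2::aes(PC1, PC2, colour = sp)) + ggplot2::geom_point()
animate_xy(f, guided_tour(holes()))  # watch a holes-guided tour interactively
h <- unclass(save_history(f, guided_tour(holes()), rescale = FALSE))  # record a holes-guided tour
f.holes_end <- f %*% matrix(h[, , dim(h)[3]], ncol = 2)  # project onto its final basis
ggplot2::ggplot(data.frame(f.holes_end, sp), ggplot2::aes(X1, X2, colour = sp)) + ggplot2::geom_point()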